QTL analysis of traits related to seed size and shape in sesame (Sesamum indicum L.)

Seed size and shape are important traits that determine seed yield in sesame. Understanding the genetic basis of seed size and shape is essential for improving the yield of sesame. In this study, F2 and BC1 populations were developed by crossing the Yuzhi 4 and Bengal small-seed (BS) lines for detecting the quantitative trait loci (QTLs) of traits related to seed size and shape. A total of 52 QTLs, including 13 in the F2 and 39 in the BC1 population, for seed length (SL), seed width (SW), and length to width ratio (L/W) were identified, explaining phenotypic variations from 3.68 to 21.64%. Of these QTLs, nine stable major QTLs were identified in the two populations. Notably, three major QTLs, qSL-LG3-2, qSW-LG3-2, and qSW-LG3-F2, that accounted for 4.94–16.34% of the phenotypic variations were co-localized in a 2.08 Mb interval on chromosome 1 (chr1) with 279 candidate genes. Three stable major QTLs, qSL-LG6-2, qLW-LG6, and qLW-LG6-F2, that explained 8.14–33.74% of the phenotypic variations were co-localized in a 3.27 Mb region on chr9 with 398 candidate genes. In addition, the stable major QTL qSL-LG5 was co-localized with the minor QTLs qLW-LG5-3 and qSW-LG5 in a 1.82 Mb region on chr3 with 195 candidate genes. Gene annotation, orthologous gene analysis, and sequence analysis indicated that three genes are likely involved in sesame seed development. The results obtained herein provide valuable information for functional gene cloning and for improving the seed yield of sesame.


Introduction
Sesame (Sesamum indicum L., 2n = 26) belongs to the Sesamum genus under the Pedaliaceae family and is one of the earliest oilseed crops to be domesticated [1]. Sesame seed provides high-quality oil containing a high content of unsaturated fatty acids and natural antioxidants for human consumption, and is also traditionally consumed directly [2,3]. The global demand for sesame seeds and derived products is increasing significantly owing to the shift towards healthier and more nutritious plant-based foods [4]. However, the quantity of sesame produced annually is much lower than that of other oil crops, such as groundnut, rapeseed, and sunflower [5]. Therefore, improving the yield of sesame is one of the most important goals of sesame breeding [6]. Seed yield is significantly affected by seed size and shape, so these traits have considerable potential for improving the yield of sesame [7]. Additionally, the traits related to seed size and shape have higher heritability and better stability across different environments than the traits related to seed yield [8]. Dissecting the genetic basis of seed size and shape will therefore aid in improving the yield of sesame seeds.
Quantitative trait locus (QTL) mapping is an effective approach for dissecting the genetic basis of complex quantitative traits of crops. A genetic map is a prerequisite for QTL mapping and can provide essential information regarding the linkage of genetic markers [9]. Several genetic maps of sesame have been constructed in the past decade [10-12]. However, the genetic dissection of the agronomic traits of sesame has long been hindered by the lack of markers and genomic information [6]. With the release of several physical maps of sesame [13-18] and the development of next-generation sequencing techniques, many QTLs/genes for agronomic traits have been identified in sesame using QTL mapping [6,19-21] and genome-wide association study (GWAS) [4,22-24]. For QTL mapping, Wu et al. [19] detected 13 QTLs by multiple interval mapping (MIM) and 17 QTLs by mixed linear composite interval mapping (MCIM) for seven grain yield-related traits using a high-density linkage map with 3,769 single-nucleotide polymorphism (SNP) markers. Zhang et al. [20] identified a determinacy gene (SiDt) controlling the determinate growth habit using an ultra-dense map in sesame. Du et al. [21] identified 19 major-effect QTLs for seed-related traits using a high-density genetic map with 2,159 SNP markers. Mei et al. [6] constructed a high-density genetic map with 3,528 specific-locus amplified fragment (SLAF) markers, and identified 46 significant QTLs for seven yield-related traits. For GWAS, Wei et al. [22] identified 549 associated loci for 56 traits in four environments. Cui et al. [23] performed GWAS for the seed coat color of 366 sesame germplasm lines in 12 environments and identified 224 SNPs with 1.01-22.10% of phenotypic variation explained (PVE). Sabag et al. [4] identified 50 signals associated with flowering date and yield-related traits. Dossou et al. [24] detected 17 and 72 SNPs associated with sesamin and sesamolin, respectively, and identified 11 candidate causative genes by GWAS.
Several studies have comprehensively analyzed the traits related to seed size and shape in various crops, and numerous QTLs/genes have been identified in rice [25,26], maize [8,27], wheat [28,29], peanut [30,31], and rapeseed [32,33]. However, there is a scarcity of studies on the seed size and shape traits in sesame except for Du et al. [21], and the genetic basis of these traits is poorly understood to date. In order to elucidate the genetic basis of these traits in sesame, we developed a BC1 and an F2 population for characterizing the seed size and shape traits in sesame. A high-density genetic map for the BC1 population was constructed using SLAF and SSR markers, and a genetic map for the F2 population was constructed using SSR markers. Three important co-localized loci harboring the stable major QTLs were subsequently identified, which may provide useful information for future breeding strategies aimed at improving the seed yield of sesame.

Plant materials and phenotypic evaluation
An exotic germplasm line, Bengal Small-seed (BS), and a locally adapted elite cultivar, Yuzhi 4, were crossed to develop the BC1 [6] and F2 segregating populations. The seed size of the male parent BS was significantly smaller than that of the female parent Yuzhi 4. The F2:3 families were generated by self-pollinating 150 F2 plants. The F2 plants and their F2:3 families were grown in Sanya (SY; N18°14', E109°29'), Hainan Province, China, in the winter of 2019 and 2020 (hereafter 2019SY and 2020SY), respectively. A total of 150 BC1F2 families were developed from 150 BC1 plants, and the BC1F2 families were planted in Pingyu (PY; N32°59', E114°42'), Nanyang (NY; N32°54', E112°24'), and Luohe (LH; N33°37', E113°58'), in Henan Province, China, in the summer of 2018 (hereafter 2018PY, 2018NY, and 2018LH). The traits of the F2:3 and BC1F2 plants were evaluated instead of those of the F2 and BC1 individuals, respectively, as described in the study by Mei et al. [6]. Each of these locations was regarded as a separate environment. The field experiments were performed in randomized complete blocks with two replicates. Each accession was planted with 20 single plants per row with a spacing of 17 cm between the plants and 40 cm between the rows. Ten uniform plants were harvested from the middle of each row at maturity. The mature seeds from these ten plants were mixed together for measuring the seed length (SL), seed width (SW), and length to width ratio (L/W) using an SC-G automatic seed analysis and 1000-grain weight instrument (Hangzhou Wanshen Detection Technology Co., Ltd., China). The L/W was calculated by dividing the SL by the SW [30]. The mean values of SL, SW, and L/W were calculated from three independent samples of 1000 seeds for phenotypic characterization. The seed size was represented by SL and SW, and the seed shape was represented by L/W.
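The trait definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are hypothetical, and averaging the per-sample ratios (rather than dividing the mean SL by the mean SW) is an assumption, since the text does not say which was done.

```python
from statistics import mean


def length_width_ratio(seed_length_mm, seed_width_mm):
    """Return L/W, the seed length divided by the seed width."""
    if seed_width_mm <= 0:
        raise ValueError("seed width must be positive")
    return seed_length_mm / seed_width_mm


def summarize_samples(samples):
    """Average SL, SW, and L/W over independent samples of one family.

    `samples` is a list of (SL, SW) pairs in mm, one pair per sample.
    Assumption: L/W is computed per sample and then averaged.
    """
    return {
        "SL": mean(sl for sl, _ in samples),
        "SW": mean(sw for _, sw in samples),
        "L/W": mean(length_width_ratio(sl, sw) for sl, sw in samples),
    }


# Parental means reported for Yuzhi 4 in the text: SL = 3.24 mm, SW = 1.76 mm.
yuzhi4_lw = length_width_ratio(3.24, 1.76)

# Hypothetical measurements from three independent seed samples of one family.
family_summary = summarize_samples([(3.0, 1.5), (3.2, 1.6), (2.8, 1.4)])
```

With the Yuzhi 4 parental means, the ratio rounds to the reported L/W of 1.84.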

Statistical analyses of phenotypic data
The variance and normal distribution of the data were analyzed using the SPSS Statistics software, version 19.0 (IBM Corp., Armonk, NY, USA). The broad-sense heritability ($h^2$) of the traits related to seed size was calculated using the following formula:

$$h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma_{GE}^2/n + \sigma_\varepsilon^2/(nr)}$$

where $\sigma_G^2$, $\sigma_{GE}^2$, and $\sigma_\varepsilon^2$ represent the genotypic variance, the variance of the interaction between the genotype (G) and the environment (E), and the variance of stochastic error, respectively; $n$ represents the number of environments, and $r$ denotes the number of replicates in each environment.
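As a worked illustration of the heritability calculation, the sketch below assumes the commonly used multi-environment form h² = σ²G / (σ²G + σ²GE/n + σ²ε/(nr)), which matches the variance components, n, and r defined above. The function name and the variance values are hypothetical and are not taken from the study.

```python
def broad_sense_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Broad-sense heritability on a family-mean basis.

    var_g     : genotypic variance
    var_ge    : genotype-by-environment interaction variance
    var_error : residual (stochastic error) variance
    n_env     : number of environments (n)
    n_rep     : number of replicates per environment (r)
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic_var = var_g + var_ge / n_env + var_error / (n_env * n_rep)
    if phenotypic_var <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic_var


# Hypothetical variance components; three environments with two replicates
# each, as in the BC1 field design described in the text.
h2_example = broad_sense_heritability(8.0, 3.0, 6.0, n_env=3, n_rep=2)
```

Dividing the interaction and error variances by n and nr reflects that family means averaged over more environments and replicates carry less non-genetic noise.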

QTL mapping
A high-density linkage map with 3,294 SLAF markers and 347 SSR markers was constructed for the BC1 population, and a genetic map with 166 SSR markers evenly distributed on the genome was constructed for the F2 population (S1 and S2 Tables). QTL mapping was performed using the IciMapping 4.1 software [34], with the inclusive composite interval mapping additive (ICIM-ADD) model. The significant LOD threshold was determined based on 1000 permutations with a type I error of 0.05. QTLs with similar genomic locations (within 5 cM) and the same direction of additive effects in different environments were regarded as the same QTL and given the same name [6]. QTLs that were identified in more than one environment were regarded as stable QTLs, and QTLs with more than 10% of PVE in at least one environment were regarded as major QTLs [6]. A graphical representation of the map was constructed using the Mapchart 2.3 software [35].
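The two classification rules above (stable: detected in more than one environment; major: PVE above 10% in at least one environment) can be sketched as follows. The data structure and function are hypothetical illustrations, not part of IciMapping. The PVE values for qSW-LG8 and qSW-LG4 are those reported in this article, but the environment labels attached to them are placeholders.

```python
from dataclasses import dataclass


@dataclass
class QtlDetection:
    name: str          # QTL name after merging detections within 5 cM
    environment: str   # environment in which the QTL was detected
    pve_percent: float # phenotypic variation explained, in percent


def classify_qtls(detections, major_pve=10.0):
    """Label each QTL as stable and/or major from its per-environment hits."""
    hits_by_name = {}
    for detection in detections:
        hits_by_name.setdefault(detection.name, []).append(detection)

    labels = {}
    for name, hits in hits_by_name.items():
        environments = {hit.environment for hit in hits}
        labels[name] = {
            "stable": len(environments) > 1,
            "major": any(hit.pve_percent > major_pve for hit in hits),
        }
    return labels


# Environment labels "env_A"/"env_B" are placeholders (assumption).
example_labels = classify_qtls([
    QtlDetection("qSW-LG8", "env_A", 6.17),
    QtlDetection("qSW-LG8", "env_B", 14.47),
    QtlDetection("qSW-LG4", "env_A", 11.45),
])
```

Under these rules qSW-LG8 is both stable and major, whereas qSW-LG4 is major but not stable.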

Identification of candidate genes for major QTLs
The marker sequences flanking each QTL were aligned against the reference genome of sesame 'Zhongzhi 13 v2.0' [15] using the TBtools software v1.106 [36] to identify the physical positions of the QTLs. The genes were extracted from the mapping intervals, and the gene functions were annotated by comparing the gene sequences with the non-redundant protein sequence database.
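Extracting the genes that fall within a mapped interval amounts to a coordinate filter over a gene annotation. The following is a minimal sketch under stated assumptions: the `Gene` record, the gene identifiers, and their coordinates are hypothetical, and a gene is counted when it overlaps the interval at all. Only the chr1 interval (16.59-18.67 Mb) comes from this article.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def genes_in_interval(genes, chrom, start, end):
    """Return genes on `chrom` that overlap the closed interval [start, end]."""
    return [
        gene for gene in genes
        if gene.chrom == chrom and gene.start <= end and gene.end >= start
    ]


# Hypothetical annotation records used only for illustration.
annotation = [
    Gene("gene_A", "chr1", 16_500_000, 16_600_000),  # straddles the left border
    Gene("gene_B", "chr1", 17_000_000, 17_010_000),  # inside the interval
    Gene("gene_C", "chr1", 18_700_000, 18_710_000),  # beyond the right border
    Gene("gene_D", "chr9", 17_000_000, 17_005_000),  # different chromosome
]

# Co-localized interval on chr1 reported in the text: 16.59-18.67 Mb.
candidates = genes_in_interval(annotation, "chr1", 16_590_000, 18_670_000)
candidate_ids = [gene.gene_id for gene in candidates]
```

Whether genes that only partly overlap a border should be counted is a design choice; a stricter filter would require `gene.start >= start and gene.end <= end`.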

Phenotype description of seed size and shape traits
The female parent Yuzhi 4 was a large-seed cultivar, while the paternal parent BS was an exotic germplasm line with a significantly smaller seed size (Fig 1A). Yuzhi 4 had an average SL, SW, and L/W of 3.24 mm, 1.76 mm, and 1.84, respectively, whereas the seeds of the BS line were smaller, with an average SL, SW, and L/W of 2.32 mm, 1.45 mm, and 1.56, respectively (Fig 1B). The traits related to seed size and shape of the F2 and BC1 populations were measured, and the descriptive statistics are provided in Table 1. The SL, SW, and L/W of the F2 population ranged from 2.20 to 3.00 mm, 1.32 to 1.76 mm, and 1.61 to 1.80, respectively, and the mean values were 2.63 mm, 1.56 mm, and 1.69, respectively, across two different environments. The SL, SW, and L/W of the BC1 population ranged from 2.52 to 3.22 mm, 1.47 to 1.78 mm, and 1.62 to 1.91, respectively, and the mean values were 2.87 mm, 1.62 mm, and 1.78, respectively, across three different environments (Table 1). Analysis of the skewness and kurtosis indicated that the three traits followed a normal distribution pattern in the two populations. The results of the analysis of variance indicated that the effects of genotypes (G) and environments (E) were highly significant in the BC1 population (P < 0.01); however, the interaction between G and E (G × E) did not significantly affect the SL, SW, and L/W across the three environments (S3 Table). The broad-sense heritability of the three traits was relatively high, ranging from 0.73 to 0.88 (S3 Table). Furthermore, for each of the traits, a significantly positive correlation of the phenotypic values was observed between each pair of environments in the BC1 population. These findings indicated that genetics plays a major role in regulating the size and shape of sesame seeds. The results of phenotypic correlation indicated that the SL was significantly positively correlated with the SW (r = 0.87, 2019SY; r = 0.86, 2020SY; P < 0.001) and L/W (r = 0.42, 2019SY; r = 0.47, 2020SY; P < 0.001) in the F2 population; however, no significant correlation was found between the SW and L/W. The SL was significantly positively correlated with the SW and L/W in the BC1 population, and the correlation coefficients were 0.76-0.81 and 0.45-0.60 (P < 0.001), respectively, across the three environments. The SW was significantly negatively correlated with the L/W in the 2018LH environment (r = -0.16, P < 0.05), while no significant correlation was found in the other two environments (S4 Table).
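The correlation coefficients reported above are of the Pearson type, which can be computed as in the sketch below. The text does not state which correlation statistic or software routine was used, so the use of Pearson's r is an assumption. The function is a plain-Python illustration and the trait values are hypothetical family means, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical family means (mm) in which longer seeds are also wider.
seed_length = [2.4, 2.6, 2.8, 3.0]
seed_width = [1.40, 1.50, 1.60, 1.70]
r_sl_sw = pearson_r(seed_length, seed_width)
```

Significance testing (the P values quoted in the text) would additionally require the sample size and a t-distribution lookup, which is omitted here.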

QTLs detected in the BC1 population
A high-density linkage map with 3,294 SLAF markers and 347 SSR markers was constructed for the BC1 population (S1 Table). This map contained 13 linkage groups (LGs) and covered a total genetic distance of 1266.87 cM. A total of 39 QTLs, distributed on 11 LGs, were detected for SL, SW, and L/W in the BC1 population across the three environments (Fig 2 and Table 2). The PVE of the QTLs ranged from 1.36% to 39.36%, with LOD scores ranging from 2.85 to 33.74. Of the 39 QTLs, 11 were identified as major QTLs, which explained more than 10% of the phenotypic variation in at least one environment; and 4, 3, and 32 QTLs were identified in three, two, and one environment, respectively. A total of 14 QTLs for SW were mapped on LG1, LG2, LG3, LG4, LG5, LG7, LG8, LG10, LG12, and LG13, which individually explained 1.44-25.29% of the phenotypic variations and had LOD scores ranging from 2.92 to 31.77. Of these 14 QTLs, one major QTL, qSW-LG8, which had a PVE of 6.17-14.47%, was detected in two environments, and four major QTLs, qSW-LG3-2, qSW-LG4, qSW-LG13-1, and qSW-LG13-2, were each identified in one environment and explained 10.93%, 11.45%, 25.29%, and 15.64% of the phenotypic variations, respectively.

QTLs identified in the F2 population
A genetic map was constructed using 166 SSR markers for the F2 population. Thirteen QTLs were identified in the 2019SY and 2020SY environments, including 5, 4, and 4 QTLs for SL, SW, and L/W, respectively (S1 Fig and S5 Table). The PVE of the QTLs for SL, SW, and L/W ranged from 6.26% to 20.94%, 4.72% to 21.37%, and 6.71% to 21.64%, respectively. Of these QTLs, the PVE of eight QTLs was higher than 10% in at least one environment, and six QTLs were detected in both environments. The QTLs qSL-LG3-F2 and qSL-LG13-F2 were identified for SL, which explained 11.05-13.27% and 6.84-10.42% of the phenotypic variations, respectively, in the two environments. The QTL qSL-LG10-F2 for SL explained 20.94% of the phenotypic variation in the 2020SY environment. The QTL qSW-LG3-F2 was identified for SW, which had a PVE of 14.88-16.35% in the two environments. The QTL qSW-LG10-F2 for SW explained 21.37% of the phenotypic variation in the 2020SY environment. The QTL qLW-LG6-F2 for the L/W in the two environments explained 4.96-21.64% of the phenotypic variations. The QTL qLW-LG7-F2 for L/W had a PVE of 12.74% in the 2019SY environment, and the QTL qLW-LG11-F2 for L/W had a PVE of 11.54% in the 2020SY environment.

Pleiotropic QTLs and co-localized loci
The QTLs for different traits that were located in close vicinity (< 5 cM) or in identical regions were regarded as pleiotropic QTLs. After integrating the two genetic maps, a total of 13 co-localized loci were identified [...] and two minor QTLs qLW-LG5-3 and qSW-LG5. The major QTL qSL-LG5, with a PVE of 8.52-10.52%, was consistently identified in three environments, while the minor QTLs qLW-LG5-3 and qSW-LG5 were detected in two and one environments, respectively. Locus9-1 was located at 64.43-87.31 cM on LG6 in the linkage map and spanned 3.27 Mb between the flanking markers HSRC3275 and Marker1958174 on chr9. Locus9-1 harbored the major QTLs qSL-LG6-2, qLW-LG6, and qLW-LG6-F2, which were identified in three, three, and two environments, respectively, and explained 8.14-39.36% of the phenotypic variations.
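The rule that QTLs for different traits lying within 5 cM of each other are treated as pleiotropic (co-localized) can be sketched as a simple position-based grouping. This is an illustration under assumptions: each QTL is reduced to a single peak position, neighbours are chained when adjacent peaks are less than 5 cM apart, and the QTL names and positions are hypothetical. The text also allows QTLs in identical regions, i.e. with overlapping intervals, which this sketch does not model.

```python
def group_colocalized(qtls, max_gap_cm=5.0):
    """Group QTLs for different traits whose peaks lie close together.

    `qtls` is a list of (name, trait, linkage_group, peak_position_cM).
    Returns only groups that involve more than one trait.
    """
    groups = []
    for qtl in sorted(qtls, key=lambda q: (q[2], q[3])):
        previous = groups[-1][-1] if groups else None
        same_lg = previous is not None and previous[2] == qtl[2]
        if same_lg and qtl[3] - previous[3] < max_gap_cm:
            groups[-1].append(qtl)   # chain onto the current cluster
        else:
            groups.append([qtl])     # start a new cluster
    return [g for g in groups if len({q[1] for q in g}) > 1]


# Hypothetical QTL peaks used only for illustration.
example_qtls = [
    ("qSL-x", "SL", "LG1", 10.0),
    ("qSW-x", "SW", "LG1", 12.5),
    ("qLW-y", "L/W", "LG1", 40.0),
    ("qSL-z", "SL", "LG2", 11.0),
]
colocalized = group_colocalized(example_qtls)
colocalized_names = [[q[0] for q in group] for group in colocalized]
```

Chaining adjacent peaks means that a long run of closely spaced QTLs ends up in one locus even if its two ends are more than 5 cM apart; a stricter definition would compare every pair within a group.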

Functional annotation of the three co-localized QTL regions
In order to identify the genes that potentially regulated seed size and shape, a total of 872 genes located in the three important co-localized loci were identified, including genes that encoded transcription factors, enzymes, and transporters (S7 Table). Gene annotation and orthologous gene analysis indicated that six genes, SIN_1008192, SIN_1015854, SIN_1015831, SIN_1024196, SIN_1024145, and SIN_1015130, were possibly involved in the development of sesame seeds. Furthermore, sequence analysis detected one non-synonymous SNP in the coding region of SIN_1015854, two non-synonymous SNPs in the coding region of SIN_1015831, and one non-synonymous SNP in the coding region of SIN_1015130 between Yuzhi 4 and BS (S2-S5 Figs). SIN_1015854 was annotated as AUXIN-REGULATED GENE INVOLVED IN ORGAN SIZE (ARGOS), which is a positive regulator of organ size in plants [37,38].
SIN_1015831 is an ortholog of the HMG1 of Arabidopsis and encodes 3-hydroxy-3-methylglutaryl coenzyme A reductase, which is involved in sterol biosynthesis [39]. SIN_1015130 was annotated to encode the E3 ubiquitin-protein ligase RHF2A, which plays an important role in gametogenesis in Arabidopsis [40].

Discussion
The size and shape of seeds are critical traits of crops, which play important roles in determining the yield of seeds and adaptation to certain environments [7,41]. The complex genetic basis of seed size and shape, which is regulated by several genes involved in various pathways, has been clearly elucidated in model plants. However, it is largely unknown in sesame. The present study revealed that there are substantial genetic variations in seed size (SL and SW) and shape (L/W) among the F2 and BC1 populations. QTL mapping identified 13 QTLs in the F2 population in two environments and 39 QTLs in the BC1 population in three environments. Many more QTLs were identified in the BC1 than in the F2 population, possibly because the much larger number of markers on the BC1 map provided more information for QTL detection. The significant cross-environment correlations and the high heritability of the three traits within the BC1 population indicated that stable QTLs can be identified in different environments. Generally, stable QTLs are defined as those that are consistently detected across different environments, and are of great value for marker-assisted breeding of varieties adapted to various ecological environments [42]. In this study, five major stable QTLs for SL, two for SW, and two for L/W were consistently identified in at least two environments. Furthermore, three major QTLs in the F2 population were verified by QTLs detected in the BC1 population. qSW-LG3-F2 in the F2 population was mapped to a region of [...] (11.63-15.16 Mb on chr13) in the F2 population overlapped with qLW-LG6 (6.01-7.65 Mb on chr9) and qSL-LG10-1 (14.81-14.82 Mb on chr13) in the BC1 population, respectively. The detection of these stable major QTLs (qSW-LG3-F2/qSW-LG3-2, qLW-LG6-F2/qLW-LG6, and qSL-LG10-F2/qSL-LG10-1) in two different populations supports the reliability of the QTLs in this study and the importance of these regions in the genetic improvement of seed size and shape in sesame. Three co-localized loci (locus1-3, locus3-1, and locus9-1) were identified as important owing to the presence of stable major QTLs for seed size and shape with high PVEs and good stabilities. The QTL qSL-LG3-2 was located in the interval of 18.35-18.67 Mb on chr1, which co-located with qSW-LG3-2 (18.42-18.48 Mb) and qSW-LG3-F2. This co-localized locus, spanning 2.08 Mb (16.59-18.67 Mb) on chr1, offered a high level of contribution to PVE by these three QTLs, i.e., 4.94-13.52% for SL across three environments, 10.93% for SW in one environment, and 14.88-16.35% for SW across two environments. Another co-localized locus was mapped in a 1. 4 [...]. This co-localized locus offered a high level of contribution to PVE by these three QTLs, i.e., 8.14-12.17% for SL across three environments, 25.37-39.36% for L/W across three environments, and 9.88-21.64% for L/W across two environments. In conclusion, these three important co-localized loci were associated with stable major QTLs for SL, SW, and L/W. The application of genetic markers in these loci to breeding programs can potentially optimize the selection of multiple traits related to sesame seed size and shape.
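Verifying a QTL from one population against a QTL from the other comes down to checking whether their physical intervals overlap on the same chromosome. The sketch below illustrates this with the chr13 and chr9 intervals quoted in this section. The function name is hypothetical, and assigning the 11.63-15.16 Mb interval to qSL-LG10-F2 is an inference from the pairing qSL-LG10-F2/qSL-LG10-1 given in the text.

```python
def overlap_mb(interval_a, interval_b):
    """Length (Mb) shared by two (chrom, start_mb, end_mb) intervals.

    Returns 0.0 when the intervals lie on different chromosomes or do not
    overlap.
    """
    chrom_a, start_a, end_a = interval_a
    chrom_b, start_b, end_b = interval_b
    if chrom_a != chrom_b:
        return 0.0
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


qsl_lg10_f2 = ("chr13", 11.63, 15.16)  # F2 population (assumed assignment)
qsl_lg10_1 = ("chr13", 14.81, 14.82)   # BC1 population
qlw_lg6 = ("chr9", 6.01, 7.65)         # BC1 population

shared_chr13 = overlap_mb(qsl_lg10_f2, qsl_lg10_1)
shared_across_chromosomes = overlap_mb(qsl_lg10_f2, qlw_lg6)
```

Here the narrow BC1 interval lies entirely inside the broader F2 interval, which is consistent with the denser BC1 map resolving the same locus more precisely.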
Several QTLs related to seed weight have been identified in previous studies by linkage mapping or association analysis [6,19,21,43]. The seed weight is considerably affected by the seed size and shape. In order to determine the genetic relationships between seed weight and seed size/shape at the individual QTL level, we compared the physical genomic locations of the QTLs identified in this study with QTLs for seed weight in previous studies. The interval regions of locus1-3 (harboring qSL-LG3-2, qSW-LG3-2, and qSW-LG3-F2), locus3-1 (harboring qSL-LG5, qSW-LG5, and qLW-LG5-3), locus4-1 (harboring qSL-LG8-2 and qSW-LG8-2), and locus13-1 (harboring qSW-LG10-3 and qLW-LG10-F2) overlapped with those of qSW_LG03, qSW_LG05-2, qSW_LG08-1, and qSW_LG10, respectively, reported in the study by Mei et al. [6]. The QTLs qSL-LG6-1 and qSL-LG6-F2 were located in 17.54-17.67 Mb and 19.63-22.67 Mb, respectively, and overlapped with the QTL Qtgw-11 detected by Wu et al. [19]. The results indicated that the seed size/shape strongly influenced the seed weight at the QTL level, and these QTLs should be selected for marker-assisted selection in breeding programs for improving the yield of sesame by increasing the seed weight. Based on the above results, candidate genes and causative sites for these important traits will be identified by QTL fine-mapping or GWAS. With more genes that underlie quantitative traits identified, navigation breeding, which has been successfully used in rice [44], will be applied in sesame.
A total of 872 genes, located in the three important co-localized QTL regions, were identified in this study. Gene annotation, orthologous gene analysis, and sequence analysis revealed that SIN_1015854, SIN_1015831, and SIN_1015130 are likely related to sesame seed size and shape. The gene SIN_1015854 was annotated as ARGOS, which is highly induced by auxin and partakes in regulating organ size in Arabidopsis [37]. The overexpression of ARGOS or BrARGOS in Arabidopsis leads to the development of larger organs owing to enhanced cell proliferation [37,45]. Wang et al. [38] also reported that the overexpression of the OsARGOS gene of rice in Arabidopsis increases the size of organs by increasing the number and size of cells. The gene SIN_1015831 encodes an HMG1 protein, which is an important enzyme in the mevalonate pathway of sterol biosynthesis [39]. Mutations in the hmg1 gene resulted in dwarfism and short siliques owing to reduced cellular elongation resulting from low sterol levels [39]. SIN_1015130 is an ortholog of the RHF2A of Arabidopsis and encodes an E3 ubiquitin-protein ligase. A previous study demonstrated that a rhf1a rhf2a double mutant developed short siliques and exhibited reduced fertility, resulting from defective gametophyte formation due to mitotic cell cycle arrest [40]. Altogether, the findings of the present study lay a foundation for further fine mapping and map-based cloning of these major QTLs for seed size and shape in sesame.

Conclusion
To understand the genetic basis of seed size and shape in sesame, we developed a high-density genetic map for the BC1 population with 3,294 SLAF markers and 347 SSR markers, and a genetic map for the F2 population with 166 SSR markers. The sesame seed size (SL and SW) and shape (L/W) of the F2 and BC1 populations were measured, and the broad-sense heritability of the three traits ranged from 0.73 to 0.88. QTL mapping identified 52 QTLs for the three traits, including 13 in the F2 and 39 in the BC1 population, which explained phenotypic variations from 3.68 to 21.64%. After integrating the two genetic maps, thirteen co-localized loci were identified. It is worth noting that three co-localized loci, locus1-3, locus3-1, and locus9-1, harboring stable major QTLs were identified. Finally, three candidate genes in the three loci that are likely related to sesame seed size and shape were identified. These results will provide new insights into the genetic basis of seed size and shape, and useful information for breeding strategies to improve the seed yield of sesame.

Fig 3. QTLs for traits related to seed size and shape detected in F2 and BC1 populations. The black bar on each LG column indicates a marker. The QTLs for F2 and BC1 populations are shown in black and red, respectively. https://doi.org/10.1371/journal.pone.0293155.g003

Table 1. Phenotypic variation in the seed size and shape of the two populations across different environments.
Note: SD, standard deviation; CV, coefficient of variation. https://doi.org/10.1371/journal.pone.0293155.t001